Abstract

The aim of the paper is to provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measures. We use completely random measures (CRM) and hierarchical CRM to define a prior for Poisson processes. We derive the marginal distribution of the resultant point process, when the underlying CRM is marginalized out. Using well-known properties unique to Poisson processes, we were able to derive an exact approach for instantiating a Poisson process with a hierarchical CRM prior. Furthermore, we derive Gibbs sampling strategies for hierarchical CRM models based on the Chinese restaurant franchise sampling scheme. As an example, we present the sum of generalized gamma processes (SGGP), and show its application in topic modelling. We show that one can determine the power-law behaviour of the topics and words in a Bayesian fashion, by defining a prior on the parameters of SGGP.


1. Introduction 

Mixed membership modelling is the problem of assigning an object to multiple latent classes/features simultaneously. Depending upon the problem, one can allow a single latent feature to be exhibited once or multiple times by the object. For instance, a document may comprise several topics, with each topic occurring in the document with variable multiplicity. The corresponding problem of mapping the words of a document to topics is referred to as topic modelling.

While parametric solutions to mixed membership modelling have been available in the literature for more than a decade (Landauer & Dumais, 1997; Hofmann, 1999; Blei et al., 2001), the first non-parametric approach that allowed the number of latent classes to be determined as well was the hierarchical Dirichlet process (HDP) (Teh et al., 2006). Both approaches model the object as a set of repeated draws from an object-specific distribution, whereby the object-specific distribution is itself sampled from a common distribution. On the other hand, recent approaches such as the hierarchical beta-negative binomial process (Zhou et al., 2012; Broderick et al., 2015) and the hierarchical gamma-Poisson process (Titsias, 2008; Zhou & Carin, 2015) model the object as a point process, sampled from an object-specific random measure, which is itself sampled from a common random measure. In some sense, these approaches are more natural for mixed membership modelling, since they model the object as a single entity rather than as a sequence of draws from a distribution.

A straightforward implementation of any of the above non-parametric models would require sampling the atoms of the base measure as well as of the object-specific measures. However, since the number of atoms in these measures is often infinite, a truncation step is required to ensure tractability. Alternatively, for the HDP, a Chinese restaurant franchise scheme (Teh et al., 2006) can be used for collapsed inference in the model (that is, without explicitly instantiating the atoms). A fully collapsed inference scheme has also been proposed for the beta-negative binomial process (BNBP) (Heaukulani & Roy, 2013; Zhou, 2014) and the gamma-gamma-Poisson process (Zhou et al., 2015). Of particular relevance is the work by Roy (2014), whereby a Chinese restaurant franchise scheme has been proposed for hierarchies of beta processes (and their generalizations), when coupled with the Bernoulli process.

In this paper, our aim is to extend fully collapsed sampling so as to allow any completely random measure (CRM) for the choice of base and object-specific measure. As proposed in Roy (2014) for hierarchies of generalized beta processes, we propose Chinese restaurant franchise schemes for hierarchies of CRMs, when coupled with the Poisson process. We hope that this will encourage the use of hierarchical random measures other than HDP and BNBP for mixed-membership modelling, and will lead to further research into the applicability of the various random measures. To give an idea of the flexibility that can be obtained by using other measures, we propose the sum of generalized gamma processes (SGGP), which allows one to determine the exponent of the power-law distribution of topics across documents by defining a prior on the parameters of SGGP. Alternatively, one can also define a prior directly on the discount parameter.

On collapsed representation of hierarchical Completely Random Measures

The main contributions in this paper are as follows:

• We derive the marginal distributions of Poisson processes when coupled with CRMs.

• We provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measures.

• We provide a Gibbs sampling approach for sampling a Poisson process from a hierarchical CRM.

• In the experiments section, we propose the sum of generalized gamma processes (SGGP), and show its applicability to topic modelling. By defining a prior on the parameters of SGGP, one can determine the power-law distribution of the topics and words in a Bayesian fashion.

2. Preliminaries and background 

In this section, we fix the notation and recall a few well 
known results from the theory of point processes. 

2.1. Poisson process 

Let $(S, \mathcal{S})$ be a measurable space and $\Pi$ be a random countable collection of points on $S$. Let $N(A) = |\Pi \cap A|$ for any measurable set $A$. $N$ is also known as the counting process of $\Pi$. $\Pi$ is called a Poisson process if $N(A)$ is independent of $N(B)$ whenever $A$ and $B$ are disjoint measurable sets, and $N(A)$ is Poisson distributed with mean $\mu(A)$ for a fixed $\sigma$-finite measure $\mu$. In the sequel, we refer to both the random collection $\Pi$ and its counting process $N$ as a Poisson process.

Let $(T, \mathcal{T})$ be another measurable space and $f : S \to T$ be a measurable function. If the push-forward measure of $\mu$ via $f$, that is, $\mu \circ f^{-1}$, is non-atomic, then $f(\Pi) = \{f(x) : x \in \Pi\}$ is also a Poisson process with mean measure $\mu \circ f^{-1}$. This is also known as the mapping proposition for Poisson processes (Kingman, 1992). Moreover, if $\Pi_1, \Pi_2, \ldots$ is a countable collection of independent Poisson processes with mean measures $\mu_1, \mu_2, \ldots$ respectively, then the union $\Pi = \bigcup_{i=1}^{\infty} \Pi_i$ is also a Poisson process with mean measure $\mu = \sum_{i=1}^{\infty} \mu_i$. This is known as the superposition proposition. Equivalently, if $N_i$ is the counting process of $\Pi_i$, then $N = \sum_{i=1}^{\infty} N_i$ is the counting process of a Poisson process with mean measure $\sum_{i=1}^{\infty} \mu_i$.
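As an illustration of the superposition proposition, the following Python sketch (standard library only; the helper names `sample_poisson`, `homogeneous_pp` and `superposed_counts` are hypothetical) simulates two independent homogeneous Poisson processes on $[0, 1]$ with assumed rates 1.5 and 2.5, and checks that the number of points of their union in $[0, 1/2)$ behaves like a Poisson variable with mean and variance $0.5 \times (1.5 + 2.5) = 2$. It is a Monte Carlo sanity check, not part of the paper's method.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def homogeneous_pp(rate, rng):
    """Points of a Poisson process on [0, 1] with mean measure rate * Lebesgue."""
    return [rng.random() for _ in range(sample_poisson(rate, rng))]


def superposed_counts(rates, reps, rng):
    """N([0, 1/2)) for the union of independent processes with the given rates."""
    counts = []
    for _ in range(reps):
        points = [x for rate in rates for x in homogeneous_pp(rate, rng)]
        counts.append(sum(1 for x in points if x < 0.5))
    return counts


counts = superposed_counts([1.5, 2.5], 20000, random.Random(1))
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
# Both mean and var should be close to 2.0 for a Poisson(2) count.
```

Restricting the union to $[0, 1/2)$ also illustrates the defining property that $N(A)$ is Poisson with mean $\mu(A)$.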

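Finally, let $g$ be a measurable function from $S$ to $\mathbb{R}$, and $\Sigma = \sum_{x \in \Pi} g(x)$. By Campbell's proposition (Kingman, 1992), $\Sigma$ is absolutely convergent with probability one if and only if

$$\int_S \min(|g(x)|, 1)\, \mu(dx) < \infty. \qquad (1)$$

If this condition holds and $g$ is non-negative, then for any $t > 0$,

$$\mathbb{E}\left[e^{-t\Sigma}\right] = \exp\left(-\int_S \left(1 - e^{-t g(x)}\right) \mu(dx)\right). \qquad (2)$$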
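Equation (2) can be checked numerically in a simple case. The sketch below assumes a homogeneous Poisson process on $[0, 1]$ with rate 3 and $g(x) = x$, for which $\int_0^1 (1 - e^{-tx})\,dx = 1 - (1 - e^{-t})/t$, so the right-hand side of (2) at $t = 1$ equals $\exp(-3/e)$. The function names are illustrative.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def mc_laplace(rate, t, reps, rng):
    """Monte Carlo E[exp(-t * Sigma)], Sigma = sum_{x in Pi} g(x) with g(x) = x
    and Pi a homogeneous Poisson process on [0, 1] with the given rate."""
    acc = 0.0
    for _ in range(reps):
        sigma = sum(rng.random() for _ in range(sample_poisson(rate, rng)))
        acc += math.exp(-t * sigma)
    return acc / reps


def campbell_laplace(rate, t):
    """Right-hand side of (2): exp(-rate * int_0^1 (1 - e^{-t x}) dx)."""
    return math.exp(-rate * (1.0 - (1.0 - math.exp(-t)) / t))


estimate = mc_laplace(3.0, 1.0, 20000, random.Random(2))
exact = campbell_laplace(3.0, 1.0)   # equals exp(-3 / e) for t = 1
```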

2.2. Completely random measures 

Let $(\Omega, \mathcal{F}, P)$ be some probability space. Let $(M(S), \mathcal{B})$ be the space of all $\sigma$-finite measures on $(S, \mathcal{S})$ supplied with an appropriate $\sigma$-algebra. A completely random measure (CRM) $\Lambda$ on $(S, \mathcal{S})$ is a measurable mapping from $\Omega$ to $M(S)$ such that

1. $P\{\Lambda(\emptyset) = 0\} = 1$,

2. for any disjoint countable collection of sets $A_1, A_2, \ldots$, the random variables $\Lambda(A_i)$, $i = 1, 2, \ldots$ are independent, and $\Lambda(\bigcup_i A_i) = \sum_i \Lambda(A_i)$ holds almost surely (the independent increments property).

An important characterization of CRMs in terms of Poisson processes is as follows (Kingman, 1967). For any CRM $\Lambda$ on $(S, \mathcal{S})$ without any fixed atoms or deterministic component, there exists a Poisson process $N$ on $(\mathbb{R}^+ \times S, \mathcal{S}_{\mathbb{R}^+} \otimes \mathcal{S})$ such that $\Lambda(dx) = \int_{\mathbb{R}^+} z\, N(dz, dx)$. Using Campbell's proposition, the Laplace transform of $\Lambda(A)$ for a measurable set $A$ is given by the following formula:

$$\mathbb{E}\left[e^{-t\Lambda(A)}\right] = \exp\left(-\int_{\mathbb{R}^+ \times A} \left(1 - e^{-tz}\right) \nu(dz, dx)\right), \quad t > 0, \qquad (3)$$

where $\nu$ denotes the mean measure of the underlying Poisson process $N$. $\nu$ is also referred to as the Poisson intensity measure of $\Lambda$. If $\nu(dz, dx) = \rho(dz)\mu(dx)$ for a $\sigma$-finite measure $\mu$ on $S$ and a $\sigma$-finite measure $\rho$ on $\mathbb{R}^+$ that satisfies $\int_{\mathbb{R}^+} (1 - e^{-tz}) \rho(dz) < \infty$, then $\Lambda(\cdot)$ is known as a homogeneous CRM. In the sequel, we assume $\mu(\cdot)$ to be finite. Moreover, unless specified otherwise, whenever we refer to a CRM, we mean a homogeneous completely random measure without any fixed atoms or deterministic component.
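The Laplace exponent in (3) for a homogeneous CRM, $\psi(t) = \int_{\mathbb{R}^+} (1 - e^{-tz}) \rho(dz)$, has closed forms for the Lévy measures used later in the paper. The following sketch assumes the (generalized) gamma density $\rho(dz) = z^{-1-d} e^{-z}/\Gamma(1 - d)\, dz$ with $0 \le d < 1/2$ (so that the substitution $z = u^2$ yields a bounded integrand), evaluates the integral by the midpoint rule, and compares it with $\ln(1 + t)$ and $((1 + t)^d - 1)/d$. The function names and quadrature settings are illustrative choices.

```python
import math


def laplace_exponent(t, d=0.0, n_steps=100_000, u_max=8.0):
    """Numerically evaluate psi(t) = int_0^inf (1 - e^{-tz}) rho(dz) for
    rho(dz) = z^{-1-d} e^{-z} / Gamma(1 - d) dz with 0 <= d < 0.5.

    The substitution z = u^2 turns the integrand into
    2 (1 - e^{-t u^2}) e^{-u^2} u^{-1-2d} / Gamma(1 - d), which vanishes at
    u = 0 for d < 0.5; the midpoint rule is then applied on (0, u_max]."""
    h = u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * h
        z = u * u
        total += -math.expm1(-t * z) * math.exp(-z) * 2.0 * u ** (-1.0 - 2.0 * d)
    return total * h / math.gamma(1.0 - d)


def closed_form(t, d=0.0):
    """ln(1 + t) for the gamma process (d = 0), ((1 + t)^d - 1)/d otherwise."""
    return math.log1p(t) if d == 0.0 else ((1.0 + t) ** d - 1.0) / d
```

For example, `laplace_exponent(1.0)` should agree with `math.log(2)` to roughly six decimal places.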

Let $N$ be the Poisson process of the CRM $\Lambda$, that is, $\Lambda(dx) = \int_{\mathbb{R}^+} z\, N(dz, dx)$. If $\Pi$ is the random collection of points corresponding to $N$, then $\Lambda$ can equivalently be written as $\Lambda = \sum_{(z, x) \in \Pi} z\, \delta_x$. The values $\{z : (z, x) \in \Pi\}$ constitute the weights of the CRM $\Lambda$. By the mapping proposition for Poisson processes, they form a Poisson process with mean measure $\mu^*(dz) = \nu \circ f^{-1}(dz)$, where $f(z, x) = z$ is the projection map on $\mathbb{R}^+$. Hence, the weights of $\Lambda$ form a Poisson process on $\mathbb{R}^+$ with mean measure $\mu^*(dz) = \nu(dz, S) = \rho(dz)\mu(S)$. We formally state this result below.

Lemma 2.1. The weights of a homogeneous CRM with no fixed atoms or deterministic component, whose Poisson intensity measure is $\nu(dz, dx) = \rho(dz)\mu(dx)$, form a Poisson process with mean measure $\rho(dz)\mu(S)$.
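Lemma 2.1 suggests a simple way to simulate the weights of a CRM above a truncation level $\varepsilon$: their number is Poisson with mean $\mu(S)\rho((\varepsilon, \infty))$ and, given that number, they are i.i.d. with density proportional to $\rho$ on $(\varepsilon, \infty)$. The minimal sketch below assumes the gamma process, $\rho(dz) = z^{-1} e^{-z} dz$, so that $\rho((\varepsilon, \infty)) = E_1(\varepsilon)$ (the exponential integral); the helpers (a power-series evaluation of $E_1$ and a rejection sampler with an exponential proposal) are hypothetical names, not code from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=60):
    """E_1(x) = int_x^inf e^{-z}/z dz from its power series (accurate for x <= 1)."""
    series, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k                 # term == (-x)^k / k!
        series += term / k
    return -EULER_GAMMA - math.log(x) - series


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def gamma_process_weights(theta, eps, rng):
    """Weights larger than eps of a gamma CRM with mu(S) = theta: a Poisson
    number, with mean theta * E_1(eps), of i.i.d. draws whose density is
    proportional to z^{-1} e^{-z} on (eps, inf)."""
    count = sample_poisson(theta * exp_integral_e1(eps), rng)
    weights = []
    while len(weights) < count:
        z = eps + rng.expovariate(1.0)   # proposal density e^{-(z - eps)}
        if rng.random() < eps / z:        # target / proposal is proportional to 1/z
            weights.append(z)
    return weights


rng = random.Random(4)
sums = [sum(gamma_process_weights(2.0, 0.05, rng)) for _ in range(3000)]
mean_sum = sum(sums) / len(sums)   # should be near theta * exp(-eps) = 2 exp(-0.05)
```

The infinitely many weights below $\varepsilon$ are simply discarded here; their expected total mass, $\mu(S)(1 - e^{-\varepsilon})$, is what separates `mean_sum` from $\mathbb{E}[\Lambda(S)] = \mu(S)$.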

Note 1: A completely random measure without any fixed atoms or deterministic component is a purely atomic random measure.

Note 2: Every such homogeneous CRM $\Lambda$ on $(S, \mathcal{S})$ has an underlying Poisson process $N$ on $(\mathbb{R}^+ \times S, \mathcal{S}_{\mathbb{R}^+} \otimes \mathcal{S})$ such that

$$\Lambda(dx) = \int_{\mathbb{R}^+} z\, N(dz, dx) \qquad (4)$$

almost surely.

3. The proposed model 

Let $X_1, \ldots, X_n$ be $n$ observed samples, for instance, $n$ documents. We assume that each sample $X_i$ is generated as follows:

• The base measure $\Phi$ is CRM$(\rho_0, \mu)$, where $\rho_0$ is a $\sigma$-finite measure on $\mathbb{R}^+$ and $\mu$ is a finite non-atomic measure on $(S, \mathcal{S})$.

• The object-specific measures $\Lambda_i$, $1 \le i \le n$, are CRM$(\rho, \Phi)$, where $\rho$ is another $\sigma$-finite non-atomic measure on $\mathbb{R}^+$.

• The latent feature set $N_i$ for each object $X_i$ is a Poisson process with mean measure $\Lambda_i$.

• Finally, the visible features $X_i$ are sampled from $N_i$.

Note: For topic modelling, $S$ corresponds to the space of all probability measures on the words in the dictionary, also known as topics. Hence, when we sample $\Phi$, we sample a subset of topics, along with the weights for those topics. This follows from the discreteness of $\Phi$. Sampling the object-specific random measures $\Lambda_i$ corresponds to sampling the document-specific weights for all the topics in $\Phi$. Sampling the latent features $N_i$ then corresponds to selecting a subset of topics from $\Lambda_i$ based on the corresponding document-specific weights. Since all the $\Lambda_i$'s have access to the same set of topics, this leads to sharing of topics among the $N_i$'s. Finally, the words in $X_i$ are sampled from the corresponding topics in $N_i$ using a categorical distribution.
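To make the generative model concrete, here is a sketch that replaces the base CRM by a finite approximation. It assumes gamma Lévy measures at both levels ($\rho_0(dz) = \rho(dz) = z^{-1}e^{-z}dz$) and approximates $\Phi$ by $K$ atoms with independent Gamma$(\theta/K, 1)$ weights, so that $\Phi(S) \sim$ Gamma$(\theta, 1)$; with an atomic base measure, the gamma object-specific CRM places an independent Gamma$(\Phi_k, 1)$ mass on atom $k$. This truncation is precisely the kind of instantiation that the collapsed scheme developed below avoids, and the function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_documents(n, theta, K, rng):
    """Latent-feature counts N_i({k}) for n objects from a K-atom
    approximation of the model with gamma Levy measures at both levels."""
    # Phi: K atoms with i.i.d. Gamma(theta / K, 1) weights.
    phi = [rng.gammavariate(theta / K, 1.0) for _ in range(K)]
    docs = []
    for _ in range(n):
        # Lambda_i | Phi: atom k receives an independent Gamma(Phi_k, 1) mass
        # (a weight that underflowed to 0.0 simply carries no mass).
        lam = [rng.gammavariate(w, 1.0) if w > 0.0 else 0.0 for w in phi]
        # N_i | Lambda_i: independent Poisson counts on the atoms.
        docs.append([sample_poisson(x, rng) for x in lam])
    return docs


rng = random.Random(3)
sizes = [sum(doc) for _ in range(1500) for doc in sample_documents(2, 2.0, 50, rng)]
mean_size = sum(sizes) / len(sizes)   # E[N_i(S)] = E[Phi(S)] = theta = 2
```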


Our aim is to infer the latent features $N_i$, $1 \le i \le n$, from $X_i$, $1 \le i \le n$. By Bayes' rule,

$$P(N_1, \ldots, N_n \mid X_1, \ldots, X_n) \propto P(X_1, \ldots, X_n \mid N_1, \ldots, N_n)\, P(N_1, \ldots, N_n) = \prod_{i=1}^{n} P(X_i \mid N_i)\, P(N_1, \ldots, N_n).$$

The conditional distribution of $X_i$ given $N_i$ is often very simple to compute; for instance, in the case of topic modelling, it is simply a product of categorical distributions. Hence, all we need to compute is the prior distribution of the latent features $N_1, \ldots, N_n$. This can be obtained by marginalizing out the base and object-specific random measures $\Phi$ and $\Lambda_i$, $1 \le i \le n$. This is what we wish to achieve in the next few sections.

We address the problem of marginalizing out the base and object-specific random measures in two steps. First, in Section 3.1, we derive results for the case when the base measure is held fixed and the object-specific random measure is marginalized out. Next, in Section 3.2, we derive results for the case when the base random measure $\Phi$ is also marginalized out. All the proofs are provided in the appendix.

3.1. Marginalizing out the object specific measure 

Let $\phi$ be a realization of the base random measure $\Phi$. Let $\Lambda_i$, $1 \le i \le n$, be independent CRM$(\rho, \phi)$. It is straightforward to see that if $\phi$ is a finite measure, the $\Lambda_i$'s will almost surely be finite. Because of the independence among the $\Lambda_i$'s, we can focus on marginalizing out a single object-specific random measure, say $\Lambda$. Although in our original formulation only one object is sampled from its object-specific random measure, we present results for the case when $n$ objects $N_1, \ldots, N_n$ are sampled from the object-specific random measure. This extended result will be needed in the next section when marginalizing out the base measure.

There are several ways to instantiate the random measure $\Lambda$. For instance, one can use the fact that, since the underlying base measure $\phi$ is purely atomic, the support of CRM$(\rho, \phi)$ is restricted to those measures whose support is a subset of the support of $\phi$. In particular, if $\phi = \sum_{l=1}^{\infty} w_l \delta_{x_l}$, then $\Lambda$ will be of the form $\sum_{l=1}^{\infty} L_l \delta_{x_l}$, where the $L_l$ are independent random variables. The independence of the $L_l$'s follows from the complete randomness of the measure.

However, we found that this approach does not lead us far. Hence, we derive the marginal distribution of the Poisson processes $N_1, \ldots, N_n$ in Propositions 3.1 and 3.2 by first assuming $\phi$ to be a non-atomic measure, and then generalize the result to the case where $\phi$ is any finite measure.

In the sequel, $\psi(t) = \int_{\mathbb{R}^+} (1 - e^{-tz}) \rho(dz)$, and $\psi^{(k)}$ is the $k$-th derivative of $\psi$.
Proposition 3.1. Let $\Lambda$ be a CRM on $(S, \mathcal{S})$ with Poisson intensity measure $\rho(dz)\mu(dx)$, where both $\rho(\cdot)$ and $\mu(\cdot)$ are non-atomic. Let $N_1, \ldots, N_n$ be $n$ independent Poisson processes with random mean measure $\Lambda$, and let $M$ be the set of distinct points of $N_i$, $1 \le i \le n$. Then, $M$ is a Poisson process with mean measure $\mathbb{E}[M(dx)] = \mu(dx) \int_{\mathbb{R}^+} (1 - e^{-nz}) \rho(dz) = \mu(dx)\, \psi(n)$.
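Proposition 3.1 can be sanity-checked for the gamma process, where $\psi(n) = \ln(1 + n)$, so that the expected number of distinct features is $\theta \ln(1 + n)$ with $\theta = \mu(S)$. The sketch below assumes a $K$-atom approximation of the gamma process (independent Gamma$(\theta/K, 1)$ weights), simulates which atoms are hit by $n$ Poisson processes, and compares the result with the exact finite-$K$ mean $K(1 - (1 + n)^{-\theta/K})$, which tends to $\theta \ln(1 + n)$ as $K$ grows. All names are illustrative.

```python
import math
import random


def mc_distinct_features(theta, n, K, reps, rng):
    """Average number of distinct atoms hit by n independent Poisson processes
    sharing the mean measure Lambda = sum_k W_k delta_k, W_k ~ Gamma(theta/K, 1)."""
    hits = 0
    for _ in range(reps):
        for _ in range(K):
            w = rng.gammavariate(theta / K, 1.0)
            # N_1({k}) + ... + N_n({k}) ~ Poisson(n w): hit w.p. 1 - e^{-n w}
            if rng.random() < -math.expm1(-n * w):
                hits += 1
    return hits / reps


def finite_k_mean(theta, n, K):
    """Exact mean for the K-atom model: K (1 - (1 + n)^{-theta / K})."""
    return -K * math.expm1(-(theta / K) * math.log1p(n))


def proposition_3_1_mean(theta, n):
    """E[M(S)] = mu(S) psi(n) = theta ln(1 + n) for the gamma process."""
    return theta * math.log1p(n)


estimate = mc_distinct_features(3.0, 4, 200, 1500, random.Random(5))
```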

The above proposition provides the distribution of the distinct points of the $n$ point processes $N_1, \ldots, N_n$. In order to complete the description of the distribution of $N_1, \ldots, N_n$, we also need to specify the joint distribution of the counts of each distinct feature in each $N_i$. This distribution is referred to as the CRM-Poisson distribution in the rest of the paper. Let $M(S) = k$ and let $m_{ij}$ be the count of the $j$-th distinct feature in the $i$-th object. Furthermore, let $[m_{ij}]_{1 \le i \le n}$ be the vector of counts of the $j$-th distinct feature in each object, and let $[m_{ij}]_{(n,k)} = [m_{ij}]_{1 \le i \le n, 1 \le j \le k}$ be the set of count vectors for all the latent features.

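Proposition 3.2. The joint distribution of the set of count vectors for the latent features $[m_{ij}]_{(n,k)}$ is given by

$$P\left([m_{ij}]_{(n,k)}\right) = (-1)^{m_{\cdot\cdot} - k}\, \frac{\theta^k\, e^{-\theta\psi(n)} \prod_{j=1}^{k} \psi^{(m_{\cdot j})}(n)}{\prod_{i=1}^{n} m_{i\cdot}!}, \qquad (5)$$

where $m_{i\cdot} = \sum_{j=1}^{k} m_{ij}$, $m_{\cdot j} = \sum_{i=1}^{n} m_{ij}$, $m_{\cdot\cdot} = \sum_{i=1}^{n} \sum_{j=1}^{k} m_{ij}$, $\theta = \mu(S)$, $\psi(t) = \int_{\mathbb{R}^+} (1 - e^{-tz}) \rho(dz)$ is the Laplace exponent of $\Lambda$, and $\psi^{(k)}$ is the $k$-th derivative of $\psi(t)$. This distribution will be referred to as CRM-Poisson$(\mu(S), \rho, n)$.

Corollary 3.3. Conditioned on $M(S) = k$, the set of count vectors for the latent features $[m_{ij}]_{(n,k)}$ is distributed as

$$P\left([m_{ij}]_{(n,k)} \mid M(S) = k\right) = (-1)^{m_{\cdot\cdot} - k}\, \frac{k! \prod_{j=1}^{k} \psi^{(m_{\cdot j})}(n)}{\prod_{i=1}^{n} m_{i\cdot}!\; \psi(n)^{k}}. \qquad (6)$$

Note that both (5) and $P(M(S) = k) = e^{-\theta\psi(n)} (\theta\psi(n))^k / k!$ contain the factor $\theta^k e^{-\theta\psi(n)}$ involving $\mu(S)$, which cancels out when they are divided in (6). Hence, conditioned on the number of points in the Poisson process $M$, the distribution of the set of counts for each latent feature $[m_{ij}]_{(n,k)}$ does not depend on the measure $\mu$. In the sequel, this distribution will be referred to as conditional CRM-Poisson$(\rho, n, k)$ or CCRM-Poisson$(\rho, n, k)$.

Example 1: The Gamma-Poisson process

The Poisson intensity measure of the gamma process is given by $\rho(dz) = e^{-z} z^{-1}\, dz$. The corresponding Laplace exponent is $\psi(t) = \ln(1 + t)$, so that $\psi^{(m)}(t) = (-1)^{m-1} (m-1)!/(1 + t)^m$. Replacing it in equation (5), we get

$$P\left([m_{ij}]_{(n,k)}\right) = \frac{\theta^k \prod_{j=1}^{k} (m_{\cdot j} - 1)!}{(1 + n)^{m_{\cdot\cdot} + \theta} \prod_{i=1}^{n} m_{i\cdot}!}. \qquad (7)$$

Next, we generalize these results to the case when $\phi$ is an atomic measure.

Proposition 3.4. Let $\Lambda$ be a completely random measure with Poisson intensity measure $\nu(dz, dx) = \phi(dx)\rho(dz)$, where $\rho$ is non-atomic. Let $N$ be a Poisson process with mean measure $\Lambda$. Then, $N$ can be obtained by sampling a Poisson process with mean measure $\phi(dx)\psi(1)$, say $M$, and then sampling the count of each feature in $M$ using the conditional CRM-Poisson distribution.

Note: The points in $M$ need not be distinct anymore, since the underlying mean measure $\phi(dx)\psi(1)$ may be atomic.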
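The probabilities (5) and (6) only require $\psi$ and the absolute values of its derivatives, since $\psi^{(m)}$ has sign $(-1)^{m-1}$ and the prefactor $(-1)^{m_{\cdot\cdot} - k}$ cancels these signs. The sketch below assumes that counts are given as an $n \times k$ list of rows with no empty column, and plugs in the gamma Laplace exponent of Example 1; the function names are hypothetical. It checks that (5) reproduces (7), and that (6) sums to one over all nonzero count vectors when $n = 2$ and $k = 1$.

```python
import math


def log_abs_psi_deriv_gamma(m, t):
    """log|psi^{(m)}(t)| for psi(t) = ln(1 + t): |psi^{(m)}(t)| = (m - 1)!/(1 + t)^m."""
    return math.lgamma(m) - m * math.log1p(t)


def _sums(counts):
    """Column sums m_{.j} and row sums m_{i.} of an n x k count matrix."""
    k = len(counts[0])
    cols = [sum(row[j] for row in counts) for j in range(k)]
    rows = [sum(row) for row in counts]
    return cols, rows


def crm_poisson_logpmf(counts, theta, psi, log_abs_deriv):
    """log of (5); the sign factor is absorbed by using |psi^{(m)}|."""
    cols, rows = _sums(counts)
    n, k = len(rows), len(cols)
    return (k * math.log(theta) - theta * psi(n)
            + sum(log_abs_deriv(c, n) for c in cols)
            - sum(math.lgamma(r + 1) for r in rows))


def ccrm_poisson_logpmf(counts, psi, log_abs_deriv):
    """log of (6): (5) divided by the Poisson(theta psi(n)) probability of k."""
    cols, rows = _sums(counts)
    n, k = len(rows), len(cols)
    return (math.lgamma(k + 1) - k * math.log(psi(n))
            + sum(log_abs_deriv(c, n) for c in cols)
            - sum(math.lgamma(r + 1) for r in rows))


def gamma_closed_form_logpmf(counts, theta):
    """log of the closed form (7)."""
    cols, rows = _sums(counts)
    n, k = len(rows), len(cols)
    return (k * math.log(theta) + sum(math.lgamma(c) for c in cols)
            - (sum(cols) + theta) * math.log1p(n)
            - sum(math.lgamma(r + 1) for r in rows))
```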


3.2. Marginalizing out the base measure 

The previous section derived the marginal distribution of the Poisson processes for a fixed realization $\phi$ of the base random measure $\Phi$. In this section, we want to marginalize out the CRM $\Phi$ as well. Marginalizing out $\Phi$ does away with the independence among the latent features $N_i$; hence, we need to model the joint distribution of $N_1, \ldots, N_n$.

The model under study is

$$\begin{aligned}
\Phi &\sim \mathrm{CRM}(\rho_0, \mu), \\
\Lambda_i \mid \Phi &\sim \mathrm{CRM}(\rho, \Phi), \quad 1 \le i \le n, \\
N_i \mid \Lambda_i &\sim \mathrm{PoissonProcess}(\Lambda_i), \quad 1 \le i \le n.
\end{aligned} \qquad (8)$$

We use Proposition 3.4 to marginalize out $\Lambda_i$ from the above description. Thus, $N_i$ can equivalently be obtained by sampling a Poisson process with mean measure $\Phi(dx) \int_{\mathbb{R}^+} (1 - e^{-z}) \rho(dz) = \psi(1)\Phi(dx)$, and then sampling the count of each feature in $M_i$ for each point process $N_i$ using Corollary 3.3. In particular, let $M_i$ be the corresponding Poisson process, let $m_{ij}$ be the count of the $j$-th feature in $M_i$ for the point process $N_i$, and let $r_{i\cdot} = M_i(S)$. The reason for the symbol $r_{i\cdot}$ will become clear when we have a picture of the entire generative model. Let $[m_{ij}]_{(\cdot, r_{i\cdot})}$ be the set of counts of the latent features for the $i$-th individual. The distribution of $[m_{ij}]_{(\cdot, r_{i\cdot})}$ conditioned on $M_i(S)$ does not depend on $\Phi$. Hence, an alternative description of $N_i$ via $M_i$ and $m_{ij}$, $1 \le j \le r_{i\cdot}$, is as follows:


$$\begin{aligned}
M_i \mid \Phi &\sim \mathrm{PoissonProcess}\Big(\Phi(\cdot) \int_{\mathbb{R}^+} (1 - e^{-z}) \rho(dz)\Big), \\
[m_{ij}]_{(\cdot, r_{i\cdot})} \mid \{M_i(S) = r_{i\cdot}\} &\sim \mathrm{CCRM\text{-}Poisson}(\rho, 1, r_{i\cdot}), \\
N_i &= \sum_{j=1}^{r_{i\cdot}} m_{ij}\, \delta_{M_{ij}},
\end{aligned} \qquad (9)$$

where $M_{ij}$ are the points in the point process $M_i$.

Conditioned on $\Phi$, the $M_i$, $1 \le i \le n$, are independent Poisson processes whose mean measure is a scaled CRM, and hence also a CRM. Hence, we are again in the domain of CRM-Poisson models. Let $\psi(1) = \int_{\mathbb{R}^+} (1 - e^{-z}) \rho(dz)$. If we define $\Phi'(dx) = \psi(1)\Phi(dx)$, then

$$\mathbb{E}\left[e^{-t\Phi'(A)}\right] = \exp\left(-\mu(A) \int_{\mathbb{R}^+} \left(1 - e^{-t\psi(1)z}\right) \rho_0(dz)\right) = \exp\left(-\mu(A) \int_{\mathbb{R}^+} \left(1 - e^{-tz'}\right) \rho_0\big(d(z'/\psi(1))\big)\right).$$

Hence, the Poisson intensity measure of the scaled CRM $\Phi'$ is given by $\rho_0(d(z/\psi(1)))\mu(dx)$. Applying Proposition 3.1 to marginalize out $\Phi$, we get that the $M_i$'s can be obtained by sampling a Poisson process $R$ with mean measure

$$\mathbb{E}[R(dx)] = \mu(dx) \int_{\mathbb{R}^+} \left(1 - e^{-nz'}\right) \rho_0\big(d(z'/\psi(1))\big) = \mu(dx) \int_{\mathbb{R}^+} \left(1 - e^{-\psi(1) n z}\right) \rho_0(dz).$$

The count of each feature in $R$ for each point process $M_i$ can then be obtained by using Corollary 3.3. In particular, let $r_{ik}$ be the count of the $k$-th point in $R$ for the point process $M_i$, and let $p = R(S)$.

A complete generative model for generating the point processes $N_i$, $1 \le i \le n$, is as follows:

$$\begin{aligned}
R &\sim \mathrm{PoissonProcess}\Big(\mu(\cdot) \int_{\mathbb{R}^+} (1 - e^{-\psi(1) n z}) \rho_0(dz)\Big), \\
[r_{ik}]_{(n,p)} \mid \{R(S) = p\} &\sim \mathrm{CCRM\text{-}Poisson}(\rho_0, \psi(1) n, p), \\
M_i &= \sum_{k=1}^{p} r_{ik}\, \delta_{R_k}, \\
[m_{ij}]_{(\cdot, r_{i\cdot})} \mid \{M_i(S) = r_{i\cdot}\} &\sim \mathrm{CCRM\text{-}Poisson}(\rho, 1, r_{i\cdot}), \\
N_i &= \sum_{j=1}^{r_{i\cdot}} m_{ij}\, \delta_{M_{ij}},
\end{aligned} \qquad (10)$$

where $R_k$ are the points of $R$. Since $R$ is again a Poisson process, it is straightforward to extend this hierarchy further by sampling $\mu(\cdot)$ again from a CRM.

4. Implementation via Gibbs sampling

Section 3 provided an approach for sampling a Poisson process from a hierarchical CRM without having to instantiate the infinitely many atoms of the base or object-specific CRM. However, it is not yet clear how the above derivations can be used for determining the latent features $N_1, \ldots, N_n$ for the objects $X_1, \ldots, X_n$, which is the aim of this work.

In this section, we provide a Gibbs sampling approach for sampling the latent features from their prior distribution, that is, $P(N_1, \ldots, N_n)$. In order to sample from the posterior, one simply needs to multiply the expressions in this section by the likelihood of the latent features. In order to perform MCMC sampling in hierarchical CRM-Poisson models, we need to marginalize out $R(S)$ and $M_i(S)$ from the distributions of $[r_{ik}]_{(n,p)}$ and $[m_{ij}]_{(\cdot, r_{i\cdot})}$ respectively. By marginalizing out the Poisson distributed random variable $R(S)$ from (10), we get that

$$[r_{ik}]_{(n,p)} \sim \mathrm{CRM\text{-}Poisson}(\mu(S), \rho_0, \psi(1) n).$$

The marginal distribution of the set of counts of each latent feature for the $i$-th individual, $[m_{ij}]_{(\cdot, r_{i\cdot})}$ (where $r_{i\cdot}$ is also random), is given by the following lemma.

Lemma 4.1. Let

$$h(u) = \mathbb{E}\left[e^{-u\Phi(S)}\right] = \exp\left(-\mu(S) \int_{\mathbb{R}^+} (1 - e^{-uz}) \rho_0(dz)\right).$$

Furthermore, let

$$\psi(u) = \int_{\mathbb{R}^+} (1 - e^{-uz}) \rho(dz), \qquad \psi_0(w) = \int_{\mathbb{R}^+} (1 - e^{-wz}) \rho_0(dz),$$

so that $h(u) = e^{-\theta\psi_0(u)}$ with $\theta = \mu(S)$. Then, $[m_{ij}]_{(\cdot, r_{i\cdot})}$ is marginally distributed as

$$P\left([m_{ij}]_{(\cdot, r_{i\cdot})}\right) = (-1)^{m_{i\cdot}}\, \frac{h^{(r_{i\cdot})}(\psi(1)) \prod_{j=1}^{r_{i\cdot}} \psi^{(m_{ij})}(1)}{m_{i\cdot}!}. \qquad (11)$$

In the case of topic modelling, the number of latent features $N_i(S)$ is equal to the number of observed features (words) in $X_i$. Hence, let $X_{il}$ be the $l$-th observed feature associated with the $i$-th object and $N_{il}$ be the corresponding latent feature. Here, we discuss the MCMC approach for sampling from the prior distribution of $N_{il}$, $1 \le l \le m_{i\cdot}$.
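Lemma 4.1 can be checked against a direct computation when both Lévy measures are those of the gamma process ($\rho_0 = \rho = z^{-1}e^{-z}dz$, so $h(u) = (1 + u)^{-\theta}$ and $\psi(1) = \ln 2$). In that case $\Lambda_i(S) \mid \Phi \sim$ Gamma$(\Phi(S), 1)$ and $\Phi(S) \sim$ Gamma$(\theta, 1)$, so $P(N_i(S) = 0)$ and $P(N_i(S) = 2)$ have closed forms; the latter must equal the sum of (11) over the two seatings $[2]$ and $[1, 1]$. The sketch below (hypothetical names, $\theta = 1.7$ chosen arbitrarily in the check) encodes both sides.

```python
import math


def lemma_4_1_logpmf_gamma(table_counts, theta):
    """log of (11) for gamma Levy measures at both levels.

    psi(t) = ln(1 + t): psi(1) = ln 2 and |psi^{(m)}(1)| = (m - 1)!/2^m.
    h(u) = (1 + u)^{-theta}: |h^{(r)}(u)| = Gamma(r + theta) /
    (Gamma(theta) (1 + u)^{r + theta}). The factor (-1)^{m_i.} makes the
    product positive, so only absolute values are needed.
    table_counts lists m_ij, the number of customers at each table."""
    r, m = len(table_counts), sum(table_counts)
    u = math.log(2.0)
    log_h = math.lgamma(r + theta) - math.lgamma(theta) - (r + theta) * math.log1p(u)
    log_psi = sum(math.lgamma(c) - c * math.log(2.0) for c in table_counts)
    return log_h + log_psi - math.lgamma(m + 1)


def direct_prob(theta, size):
    """P(N_i(S) = size) for size in {0, 2} from Lambda_i(S) | Phi ~ Gamma(Phi(S), 1),
    Phi(S) ~ Gamma(theta, 1), using E[Phi^j e^{-s Phi}] =
    Gamma(theta + j) / (Gamma(theta) (1 + s)^{theta + j})."""
    s = math.log(2.0)
    if size == 0:
        return (1 + s) ** (-theta)                         # E[2^{-Phi(S)}]
    if size == 2:
        # E[Phi (Phi + 1) 2^{-Phi}] / 8
        return (theta * (theta + 1) / (1 + s) ** (theta + 2)
                + theta / (1 + s) ** (theta + 1)) / 8.0
    raise ValueError("only sizes 0 and 2 are implemented")
```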


As discussed in (Neal, 2000), it is more efficient to sample the index of the latent feature, rather than the latent feature itself. Hence, let $T_{il}$ be the index of the point in $M_i$ associated with $N_{il}$, and let $D_{ij}$ be the index of the point in $R$ associated with $M_{ij}$. In analogy with the Chinese restaurant franchise model (Teh et al., 2006), one can think of $T_{il}$ as the index of the table assigned to the $l$-th customer in the $i$-th restaurant, and of $D_{ij}$ as the index of the dish associated with the $j$-th table in the $i$-th restaurant. Moreover, $m_{ij}$ refers to the number of customers sitting at the $j$-th table in the $i$-th restaurant, and $r_{ik}$ refers to the number of tables in the $i$-th restaurant serving the $k$-th dish. Hence, $r_{i\cdot} = \sum_{k} r_{ik}$ is the number of tables in the $i$-th restaurant.

The distribution of the number of customers per table in the $i$-th restaurant follows from Lemma 4.1. Hence, in order to sample the table of the $l$-th customer, $T_{il}$, given the indices of the tables of all the other customers in the $i$-th restaurant, we treat it as the table corresponding to the last customer of the restaurant. Let $m_{ij}^{-(il)}$ be the number of customers sitting at the $j$-th table in the $i$-th restaurant, excluding the $l$-th customer. The probability that the $l$-th customer in the $i$-th restaurant occupies the $j$-th table is proportional to $P(m_{ij'}^{-(il)} + \mathbb{1}_{j'=j},\ 1 \le j' \le r_{i\cdot})$ as given in (11). We divide the expression by $P(m_{ij'}^{-(il)},\ 1 \le j' \le r_{i\cdot})$ to get a simpler form for the unnormalized probability distribution. Hence, the probability of assigning an existing table with index $j$ is given by

$$P\left(T_{il} = j \mid T^{-(il)}\right) \propto -\frac{\psi^{(m_{ij}^{-(il)} + 1)}(1)}{\psi^{(m_{ij}^{-(il)})}(1)}, \qquad (12)$$

and the probability of sampling a new table for the customer is given by

$$P\left(T_{il} = r_{i\cdot} + 1 \mid T^{-(il)}\right) \propto -\psi^{(1)}(1)\, \frac{h^{(r_{i\cdot} + 1)}(\psi(1))}{h^{(r_{i\cdot})}(\psi(1))}, \qquad (13)$$

where $\psi(t) = \int_{\mathbb{R}^+} (1 - e^{-tz}) \rho(dz)$ and $h^{(k)}$ is the $k$-th derivative of $h$.
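Equations (12) and (13) only need callables for $\psi^{(k)}$ and $h^{(k)}$. The sketch below keeps them generic; the gamma instantiation at the bottom ($\theta = 1.5$, $\psi(t) = \ln(1 + t)$, $h(u) = (1 + u)^{-\theta}$) is an assumed example, for which the normalized probabilities should be proportional to the table sizes and to $(\theta + r_{i\cdot})/(1 + \ln 2)$ for a new table. Function names are hypothetical.

```python
import math


def table_weights(counts, psi_deriv, h_deriv, psi_at_1):
    """Unnormalized (12)-(13) for one customer.

    counts    -- m_ij^{-(il)} for the r occupied tables, customer l removed
    psi_deriv -- callable (k, t) -> psi^{(k)}(t), k >= 1
    h_deriv   -- callable (k, u) -> h^{(k)}(u), k >= 0 (k = 0 is h itself)
    psi_at_1  -- psi(1)
    Returns r weights for the existing tables followed by one for a new table."""
    r = len(counts)
    existing = [-psi_deriv(m + 1, 1.0) / psi_deriv(m, 1.0) for m in counts]
    new = -psi_deriv(1, 1.0) * h_deriv(r + 1, psi_at_1) / h_deriv(r, psi_at_1)
    return existing + [new]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


# Assumed example: gamma Levy measures at both levels.
THETA = 1.5


def gamma_psi_deriv(k, t):
    return (-1) ** (k - 1) * math.gamma(k) / (1.0 + t) ** k


def gamma_h_deriv(k, u):
    return ((-1) ** k * math.gamma(k + THETA)
            / (math.gamma(THETA) * (1.0 + u) ** (k + THETA)))


probs = normalize(table_weights([3, 1, 2], gamma_psi_deriv, gamma_h_deriv,
                                math.log(2.0)))
```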


Moreover, whenever a new table is sampled for a customer, a dish is sampled for the table from the distribution of tables per dish. By the discussion at the beginning of this section, the number of tables per dish $[r_{ik}]_{(n,p)}$ follows a CRM-Poisson$(\mu(S), \rho_0, \psi(1) n)$ distribution. Hence, in order to sample the dish at the $j$-th table of the $i$-th restaurant, $D_{ij}$, given the indices of the dishes at all the other tables, we treat it as the dish corresponding to the last table of the last restaurant. Let $r_{\cdot k}^{-(ij)}$ be the total number of tables serving the $k$-th dish, excluding the $j$-th table of the $i$-th restaurant. The probability that the $k$-th dish is served at the $j$-th table in the $i$-th restaurant is proportional to $P(r_{i'k'}^{-(ij)} + \mathbb{1}_{i'=i,\, k'=k},\ 1 \le i' \le n,\ 1 \le k' \le p)$ as given in (5). We divide the expression by $P(r_{i'k'}^{-(ij)},\ 1 \le i' \le n,\ 1 \le k' \le p)$ to get a simpler form for the unnormalized probability distribution. Hence, the probability of serving an existing dish with index $k$ is given by

$$P\left(D_{ij} = k \mid D^{-(ij)}\right) \propto -\frac{\psi_0^{(r_{\cdot k}^{-(ij)} + 1)}(\psi(1) n)}{\psi_0^{(r_{\cdot k}^{-(ij)})}(\psi(1) n)}, \qquad (14)$$

and the probability of sampling a new dish for the table is given by

$$P\left(D_{ij} = p + 1 \mid D^{-(ij)}\right) \propto \theta\, \psi_0^{(1)}(\psi(1) n), \qquad (15)$$

where $\psi_0(t) = \int_{\mathbb{R}^+} (1 - e^{-tz}) \rho_0(dz)$ and $\theta = \mu(S)$.
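Similarly, (14) and (15) can be evaluated from a callable for $\psi_0^{(k)}$. In the sketch below (hypothetical names), a gamma base measure with $\theta = 2$ and $n = 3$ restaurants is an assumed example; the resulting probabilities should be proportional to the table counts $r_{\cdot k}^{-(ij)}$ and to $\theta$, independently of $\psi(1) n$.

```python
import math


def dish_weights(table_counts, psi0_deriv, theta, psi_at_1, n):
    """Unnormalized (14)-(15) for the dish of one table.

    table_counts -- r_{.k}^{-(ij)} (>= 1) for the p dishes still in use once
                    the table being resampled is removed
    psi0_deriv   -- callable (k, t) -> psi_0^{(k)}(t), k >= 1
    Returns p weights for existing dishes followed by one for a new dish."""
    u = psi_at_1 * n
    existing = [-psi0_deriv(r + 1, u) / psi0_deriv(r, u) for r in table_counts]
    return existing + [theta * psi0_deriv(1, u)]


def gamma_psi0_deriv(k, t):
    """Derivatives of psi_0(t) = ln(1 + t) (gamma base measure)."""
    return (-1) ** (k - 1) * math.gamma(k) / (1.0 + t) ** k


w = dish_weights([4, 1, 2], gamma_psi0_deriv, theta=2.0,
                 psi_at_1=math.log(2.0), n=3)
probs = [x / sum(w) for x in w]
```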


Hence, a complete description of one iteration of MCMC sampling from the prior distribution in hierarchical CRM-Poisson models is as follows:

1. For each customer in each restaurant, sample his table index conditioned on the indices of the tables of the other customers, according to equations (12) and (13).

2. If the table selected is a new table, sample the index of the dish corresponding to that table from equations (14) and (15).

3. Sample the index of the dish for each table, conditioned on the indices of the dishes at the other tables, according to equations (14) and (15).
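The three steps above can be sketched as a single sweep over a Chinese restaurant franchise state. This sketch assumes gamma Lévy measures at both levels, so the ratios in (12)-(15) reduce to the weights $m_{ij}^{-(il)}$, $(\theta + r_{i\cdot})/(1 + \ln 2)$, $r_{\cdot k}^{-(ij)}$ and $\theta$ once common factors are dropped; it samples from the prior only. The data layout and function names are hypothetical.

```python
import math
import random


def draw(weights, rng):
    """Index sampled with probability proportional to weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def draw_dish(dish_tables, theta, rng):
    """(14)-(15) with psi_0(t) = ln(1 + t): existing dish k has weight r_{.k},
    a new dish has weight theta (the common factor 1/(1 + n ln 2) is dropped)."""
    keys = list(dish_tables)
    idx = draw([dish_tables[k] for k in keys] + [theta], rng)
    return keys[idx] if idx < len(keys) else max(keys, default=-1) + 1


def _remove_table_from_dish(dish_tables, k):
    dish_tables[k] -= 1
    if dish_tables[k] == 0:
        del dish_tables[k]


def gibbs_sweep(seating, tables, dish_tables, theta, rng):
    """One prior sweep (steps 1-3) with gamma Levy measures at both levels.

    seating[i][l]  -- table id of customer l in restaurant i
    tables[i][t]   -- [number of customers, dish] of table t in restaurant i
    dish_tables[k] -- number of tables, over all restaurants, serving dish k"""
    new_table_scale = 1.0 + math.log(2.0)    # 1 + psi(1) with psi(1) = ln 2
    for i, restaurant in enumerate(seating):
        for l, t in enumerate(restaurant):
            tables[i][t][0] -= 1               # take customer l out
            if tables[i][t][0] == 0:           # an emptied table disappears
                _remove_table_from_dish(dish_tables, tables[i].pop(t)[1])
            ids = list(tables[i])
            r = len(ids)
            # (12)-(13): weights m_ij and (theta + r)/(1 + ln 2), factor 1/2 dropped
            weights = [tables[i][j][0] for j in ids] + [(theta + r) / new_table_scale]
            choice = draw(weights, rng)
            if choice < r:                     # step 1: join an existing table
                t = ids[choice]
                tables[i][t][0] += 1
            else:                              # step 2: open a table and pick its dish
                t = max(tables[i], default=-1) + 1
                dish = draw_dish(dish_tables, theta, rng)
                tables[i][t] = [1, dish]
                dish_tables[dish] = dish_tables.get(dish, 0) + 1
            restaurant[l] = t
    for restaurant_tables in tables:           # step 3: resample every table's dish
        for table in restaurant_tables.values():
            _remove_table_from_dish(dish_tables, table[1])
            table[1] = draw_dish(dish_tables, theta, rng)
            dish_tables[table[1]] = dish_tables.get(table[1], 0) + 1


def initial_state(customers):
    """Every restaurant starts with one table (id 0) serving dish 0."""
    seating = [[0] * m for m in customers]
    tables = [{0: [m, 0]} for m in customers]
    return seating, tables, {0: len(customers)}


seating, tables, dish_tables = initial_state([5, 3, 4])
rng = random.Random(7)
for _ in range(50):
    gibbs_sweep(seating, tables, dish_tables, 1.5, rng)
```

For the posterior, as noted at the start of this section, each weight would additionally be multiplied by the likelihood of the corresponding observed features before calling `draw`.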


Example 2: The Gamma-Gamma-Poisson process 

We compute the dish and table sampling probabilities for the Gamma-Gamma-Poisson process using the above equations. The Poisson intensity measure for both the base and object-specific measures $\Phi$ and $\Lambda_i$, $1 \le i \le n$, is $z^{-1} e^{-z}\, dz$. The corresponding Laplace exponent is given by $\psi(t) = \psi_0(t) = \ln(1 + t)$. Moreover, let the mean measure for the base measure $\Phi$ be $\mu(\cdot)$ and $\mu(S) = \theta$. Then, $h(u) = \mathbb{E}[e^{-u\Phi(S)}] = (1 + u)^{-\theta}$. The corresponding derivatives are given by

$$\psi^{(k)}(t) = \psi_0^{(k)}(t) = \frac{(-1)^{k-1}\, \Gamma(k)}{(1 + t)^k}, \qquad (16)$$

$$h^{(k)}(u) = \frac{(-1)^k\, \Gamma(k + \theta)}{(1 + u)^{k + \theta}\, \Gamma(\theta)}. \qquad (17)$$


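The corresponding dish sampling probabilities are given by

$$P\left(D_{ij} = k \mid D^{-(ij)}\right) \propto \frac{r_{\cdot k}^{-(ij)}}{1 + n \ln 2}, \qquad (18)$$

$$P\left(D_{ij} = p + 1 \mid D^{-(ij)}\right) \propto \frac{\theta}{1 + n \ln 2}, \qquad (19)$$

for an existing and a new dish respectively. Normalizing these probabilities, we get

$$P\left(D_{ij} = k \mid D^{-(ij)}\right) = \frac{r_{\cdot k}^{-(ij)}}{\sum_{k'=1}^{p} r_{\cdot k'}^{-(ij)} + \theta}, \qquad (20)$$

$$P\left(D_{ij} = p + 1 \mid D^{-(ij)}\right) = \frac{\theta}{\sum_{k'=1}^{p} r_{\cdot k'}^{-(ij)} + \theta}. \qquad (21)$$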
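The closed forms (16) and (17) are easy to verify numerically. The sketch below ($\theta = 2.5$ is an arbitrary assumed value; names are illustrative) compares them with central finite differences and evaluates the ratio $-h^{(r+1)}(u)/h^{(r)}(u) = (\theta + r)/(1 + u)$ that enters the new-table probability (13) at $u = \psi(1) = \ln 2$.

```python
import math

THETA = 2.5


def h(u):
    """h(u) = E[exp(-u Phi(S))] = (1 + u)^{-theta} for a gamma base measure."""
    return (1.0 + u) ** (-THETA)


def h_deriv(k, u):
    """Closed form (17)."""
    return ((-1) ** k * math.gamma(k + THETA)
            / ((1.0 + u) ** (k + THETA) * math.gamma(THETA)))


def psi_deriv(k, t):
    """Closed form (16) for psi(t) = psi_0(t) = ln(1 + t)."""
    return (-1) ** (k - 1) * math.gamma(k) / (1.0 + t) ** k


def central_difference(f, x, order, step=1e-4):
    """First- or second-order central difference approximation of f at x."""
    if order == 1:
        return (f(x + step) - f(x - step)) / (2.0 * step)
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step ** 2


u = math.log(2.0)                      # psi(1), where (13) evaluates h
ratio = -h_deriv(4, u) / h_deriv(3, u) # should equal (theta + 3) / (1 + u)
```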
The table sampling probabilities are given by

$$P\left(T_{il} = j \mid T^{-(il)}\right) \propto m_{ij}^{-(il)}, \qquad (22)$$

$$P\left(T_{il} = r_{i\cdot} + 1 \mid T^{-(il)}\right) \propto \frac{\theta + r_{i\cdot}}{1 + \ln 2}, \qquad (23)$$

for an existing and a new table respectively (the common factor $\psi^{(1)}(1) = 1/2$ has been dropped). Normalizing these probabilities, we get

$$P\left(T_{il} = j \mid T^{-(il)}\right) = \frac{m_{ij}^{-(il)}}{m_{i\cdot}^{-(il)} + \frac{\theta + r_{i\cdot}}{1 + \ln 2}}, \qquad (24)$$

$$P\left(T_{il} = r_{i\cdot} + 1 \mid T^{-(il)}\right) = \frac{(\theta + r_{i\cdot})/(1 + \ln 2)}{m_{i\cdot}^{-(il)} + \frac{\theta + r_{i\cdot}}{1 + \ln 2}}. \qquad (25)$$

Example 3: The Gamma-Generalized Gamma-Poisson process

In this scenario, the base random measure has $\rho_0(dz) = e^{-z} z^{-1}\, dz$, whereas the object-specific measure has $\rho(dz) = \frac{1}{\Gamma(1 - d)} z^{-1-d} e^{-z}\, dz$, where $0 < d < 1$ is known as the discount parameter. The corresponding Laplace exponents are given by $\psi_0(t) = \ln(1 + t)$ and $\psi(t) = ((1 + t)^d - 1)/d$ respectively. The derivatives of $\psi$ are given by

$$\psi^{(1)}(t) = (1 + t)^{d - 1}, \qquad (26)$$

$$\psi^{(k)}(t) = \frac{(-1)^{k-1}\, \Gamma(k - d)}{(1 + t)^{k - d}\, \Gamma(1 - d)}. \qquad (27)$$

The other derivatives remain the same as in the previous example. Moreover, the dish sampling probabilities remain the same. The table sampling probabilities are given by

$$P\left(T_{il} = j \mid T^{-(il)}\right) = \frac{m_{ij}^{-(il)} - d}{m_{i\cdot}^{-(il)} - d\, r_{i\cdot} + 2^d\, \frac{\theta + r_{i\cdot}}{1 + \ln_d(2)}}, \qquad (28)$$

$$P\left(T_{il} = r_{i\cdot} + 1 \mid T^{-(il)}\right) = \frac{2^d (\theta + r_{i\cdot})/(1 + \ln_d(2))}{m_{i\cdot}^{-(il)} - d\, r_{i\cdot} + 2^d\, \frac{\theta + r_{i\cdot}}{1 + \ln_d(2)}}, \qquad (29)$$

where $\ln_d(2) = \psi(1) = (2^d - 1)/d$.

5. Experimental results

We use hierarchical CRM-Poisson models for learning topics from the NIPS corpus¹.

¹ The dataset can be downloaded from http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

5.1. Evaluation

For evaluating the different models, we divide each document into a training section and a test section by independently sampling a Boolean random variable for each word. The probability of sending a word to the training section is varied from 0.3 to 0.7. We run 2000 iterations of Gibbs sampling. The first 500 iterations are discarded, and one sample every 5 iterations afterwards is used to update the document-specific distribution on topics and the topic-specific distribution on words. In particular, let $W$ be the number of words, $K$ be the number of topics, $(\beta_{dk})_{1 \le k \le K}$ be the document-specific distribution on topics for document $d$, and $(\tau_{kw})_{1 \le w \le W}$ be the topic-specific distribution on words for the $k$-th topic. Then, the probability of observing a word $w$ in document $d$ is given by $\sum_{k=1}^{K} \beta_{dk} \tau_{kw}$. For the evaluation metric, we use perplexity, which is simply the inverse of the geometric mean of the probability of all the words in the test set.

5.2. Varying the Common CRM

In our experiments, we fix the object-specific random measure $\Lambda_i$ in (8) to be the gamma process, with $\rho(dz) = e^{-z} z^{-1}\, dz$. For the base CRM $\Phi$, we consider two specific choices of random measures.

• Generalized gamma process (GGP): The Poisson intensity measure of $\Phi$ is given by $\nu(dz, dx) = \rho_0(dz)\mu(dx)$, where $\rho_0(dz) = \frac{\theta}{\Gamma(1 - d)} z^{-1-d} e^{-z}\, dz$, $0 < d < 1$, $\theta > 0$ and $\mu(S) = 1$. The corresponding Laplace exponent is given by $\theta((1 + t)^d - 1)/d$.

• Sum of generalized gamma processes (SGGP): The Poisson intensity measure of the CRM is given by $\nu(dz, dx) = \rho_0(dz)\mu(dx)$, where

$$\rho_0(dz) = \sum_{q=1}^{m} \frac{\theta_q}{\Gamma(1 - d_q)} z^{-1-d_q} e^{-z}\, dz, \qquad (30)$$

and $\mu(S) = 1$. The corresponding Laplace exponent is given by

$$\psi_0(t) = \sum_{q=1}^{m} \theta_q\, \frac{(1 + t)^{d_q} - 1}{d_q}, \qquad (31)$$

where a term with $d_q = 0$ is read as its limit $\theta_q \ln(1 + t)$.
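For the SGGP base measure, the derivatives of (31) are sums of generalized gamma terms, so the dish probabilities (14)-(15) remain cheap to evaluate. The sketch below assumes $\mu(S) = 1$ and a gamma object-specific measure (so $\psi(1) = \ln 2$); the parameter values and function names are illustrative, and a component with $d_q = 0$ is treated as the gamma limit.

```python
import math


def sggp_psi0(t, thetas, ds):
    """(31); a component with d_q = 0 is the gamma term theta_q ln(1 + t)."""
    return sum(th * (math.log1p(t) if d == 0.0 else ((1.0 + t) ** d - 1.0) / d)
               for th, d in zip(thetas, ds))


def sggp_psi0_deriv(k, t, thetas, ds):
    """k-th derivative (k >= 1) of (31): each component contributes
    theta_q (-1)^{k-1} Gamma(k - d_q) / (Gamma(1 - d_q) (1 + t)^{k - d_q})."""
    return sum(th * (-1) ** (k - 1) * math.gamma(k - d)
               / (math.gamma(1.0 - d) * (1.0 + t) ** (k - d))
               for th, d in zip(thetas, ds))


def sggp_dish_probs(table_counts, thetas, ds, n, psi_at_1=math.log(2.0)):
    """Normalized (14)-(15) with mu(S) = 1 and a gamma object-specific measure,
    for which psi(1) = ln 2. table_counts holds r_{.k}^{-(ij)} >= 1."""
    u = psi_at_1 * n
    w = [-sggp_psi0_deriv(r + 1, u, thetas, ds) / sggp_psi0_deriv(r, u, thetas, ds)
         for r in table_counts]
    w.append(sggp_psi0_deriv(1, u, thetas, ds))   # theta = mu(S) = 1 in (15)
    total = sum(w)
    return [x / total for x in w]


thetas, ds = [1.0, 0.5, 0.8, 0.3, 0.2], [0.0, 0.1, 0.2, 0.3, 0.4]
probs = sggp_dish_probs([5, 2, 1], thetas, ds, n=10)
```

With a single component and $d_1 = 0$, the sketch reduces to the gamma case, where the probabilities are proportional to the table counts and to the component weight.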


For the case of GGP, the value of the discount parameter $d$ is chosen from the set $\{0, 0.1, 0.2, 0.3, 0.4\}$. Furthermore, a gamma prior with rate parameter 2 and shape parameter 4 is defined on $\theta$.

Note: The generalized gamma process with discount parameter 0 corresponds to the gamma process. Using a gamma process prior for the base and object-specific CRMs corresponds exactly to the hierarchical Dirichlet process with a gamma prior on the concentration parameter of the object-specific Dirichlet process. We did not add comparison results with HDP separately, because the same perplexity is obtained for both models.

For the case of SGGP, we consider $m = 5$, and $d_1 = 0, d_2 = 0.1, \ldots, d_5 = 0.4$. Furthermore, independent gamma priors with rate parameter 2 and shape parameter 4 are defined for each $\theta_q$, $1 \le q \le 5$. The posterior of each parameter $\theta_q$ is sampled via uniform sampling. We use equations (12)-(15) to compute the dish sampling and table sampling probabilities. The probability of sampling an existing dish is given by

$$P\left(D_{ij} = k \mid D^{-(ij)}\right) \propto \frac{\sum_{q=1}^{5} \theta_q \frac{\Gamma(r_{\cdot k}^{-(ij)} + 1 - d_q)}{\Gamma(1 - d_q)} (1 + \psi(1) n)^{d_q - r_{\cdot k}^{-(ij)} - 1}}{\sum_{q=1}^{5} \theta_q \frac{\Gamma(r_{\cdot k}^{-(ij)} - d_q)}{\Gamma(1 - d_q)} (1 + \psi(1) n)^{d_q - r_{\cdot k}^{-(ij)}}},$$
where $\psi(1) = \int_{\mathbb{R}^+} (1 - e^{-z}) \rho(dz) = \ln(2)$. Similarly, the probability of a new dish is given by

$$P\left(D_{ij} = p + 1 \mid D^{-(ij)}\right) \propto \sum_{q=1}^{5} \theta_q (1 + \psi(1) n)^{d_q - 1}.$$

The table-sampling probabilities can be computed similarly. We approximated the Laplace transform of $\Phi(S)$ ($h$ in (13)) by a weighted sum of exponential functions to simplify the computation of its derivatives. The perplexity of the hierarchical CRM-Poisson models as a function of the training percentage is plotted in Figure 1. Note that Figure 1 does not necessarily imply that SGGM-based models will always outperform GGM-based models, as the results have been obtained by defining a specific gamma prior for each hyperparameter, as mentioned above.

[Figure 1: perplexity against training percentage (30 to 70); legend entries include GGM(0), GGM(.2) and GGM(.3)]

Figure 1. Variation of perplexity with training percentage for various hierarchical CRM-Poisson models.

6. Conclusion

For years, hierarchical Dirichlet processes have been the standard tool for nonparametric topic modelling, since collapsed inference in HDP can be performed using the Chinese restaurant franchise scheme. In this paper, our aim was to show that collapsed Gibbs sampling can be extended to a much larger set of hierarchical random measures using the same Chinese restaurant franchise scheme, thereby opening doors for further research into the efficacy of various hierarchical priors. We hope that this will encourage a better understanding of the applicability of various hierarchical CRM priors. Furthermore, the results of the paper can be used to prove results for hierarchical CRMs in other contexts, for instance, nonparametric hidden Markov models.
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Appendix 

Proof of proposition 3.1 


Proof Let N = Since, conditioned on A, 

Ni,..., Nn are independent Poisson process with mean 
measure A, by the superposition proposition for Pois¬ 
son processes, is a Poisson process conditioned on A 
with mean measure nA. Since, |A] = 

E[e*i^M)|A(A)]E[e*=^(-®)|A(B)], and A(A) and A{B) 
are independent, hence, N{A) and N{B) are also inde¬ 
pendent and therefore, A^ is a CRM. Hence, A^(d 2 :) = 
/g+ siV(dz,d 2 :), for a Poisson process iV on S' x IR+. 
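Moreover,

$$\begin{aligned}
\mathbb{E}\big[e^{-tN(A)}\big] &= \mathbb{E}\Big[\mathbb{E}\big[e^{-tN(A)} \mid \Lambda\big]\Big] = \mathbb{E}\Big[\exp\big(-n\Lambda(A)(1 - e^{-t})\big)\Big] \\
&= \exp\Big(-\mu(A)\int_{\mathbb{R}_+}\big(1 - e^{-nz(1 - e^{-t})}\big)\rho(dz)\Big) \\
&= \exp\Big(-\mu(A)\int_{\mathbb{R}_+}\Big(\sum_{k=0}^{\infty}\frac{e^{-nz}(nz)^k}{k!} - \sum_{k=0}^{\infty}\frac{e^{-nz}(nz)^k e^{-tk}}{k!}\Big)\rho(dz)\Big),
\end{aligned}$$

where we have used the fact that $1 = \sum_{k=0}^{\infty} e^{-nz}(nz)^k / k!$. Rearranging the terms in the above equation, we get

$$\mathbb{E}\big[e^{-tN(A)}\big] = \exp\Big(-\mu(A)\sum_{k=1}^{\infty}\big(1 - e^{-tk}\big)\int_{\mathbb{R}_+}\frac{e^{-nz}(nz)^k}{k!}\rho(dz)\Big).$$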


Hence, the Poisson intensity measure of $N$, when viewed as a CRM, is given by

$$\nu(dk, dx) = \mu(dx)\int_{\mathbb{R}_+}\frac{e^{-nz}(nz)^k}{k!}\rho(dz)$$

when $k \in \{1, 2, 3, \dots\}$, and 0 otherwise. The distinct points of $N$ can be obtained by projecting $\tilde N$ on $S$. Hence, by the mapping proposition for Poisson processes (Kingman, 1992), the distinct points of $N$ form a Poisson process with mean measure $\mu^*(dx) = \nu(f^{-1}(dx))$, where $f$ is the projection map on $S$. Hence $f^{-1}(dx) = (\{1, 2, 3, \dots\}, dx)$, and

$$\mu^*(dx) = \sum_{k=1}^{\infty}\nu(k, dx) = \mu(dx)\int_{\mathbb{R}_+}\big(1 - e^{-nz}\big)\rho(dz).$$

Thus, the result follows.


□ 
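The rearrangement in this proof can be checked numerically for a concrete Lévy density. The sketch below assumes, purely for illustration, the gamma-process density $\rho(dz) = z^{-1}e^{-z}\,dz$ (not necessarily the one used in the paper). Under that assumption $\nu$ has the closed form $\nu_k = \frac{1}{k}\big(\frac{n}{n+1}\big)^k$ per unit of $\mu$, the total mass of $\mu^*$ per unit of $\mu$ is $\sum_k \nu_k = \ln(1+n)$ (which equals $\ln(2)$ at $n = 1$), and $\sum_k (1 - e^{-tk})\nu_k$ must reproduce $\int(1 - e^{-nz(1-e^{-t})})\rho(dz) = \ln(1 + n(1 - e^{-t}))$. Function names are hypothetical.

```python
import math

# Illustrative assumption: gamma-process Levy density rho(dz) = z**-1 * exp(-z) dz.


def nu_closed_form(k, n):
    """nu_k = int e^{-nz} (nz)^k / k! rho(dz) = (n / (n + 1))**k / k (per unit of mu)."""
    return (n / (n + 1.0)) ** k / k


def nu_numeric(k, n, upper=80.0, step=1e-3):
    """Midpoint-rule evaluation of the same integral, for k >= 1."""
    k_fact = math.factorial(k)
    total = 0.0
    for i in range(int(upper / step)):
        z = (i + 0.5) * step
        total += math.exp(-n * z) * (n * z) ** k / k_fact * math.exp(-z) / z
    return total * step


def exponent_from_nu(t, n, kmax=3000):
    """sum_{k>=1} (1 - e^{-tk}) nu_k, the exponent obtained after rearranging."""
    return sum((1.0 - math.exp(-t * k)) * nu_closed_form(k, n) for k in range(1, kmax + 1))


def exponent_direct(t, n):
    """int (1 - exp(-n z (1 - e^{-t}))) rho(dz) = log(1 + n (1 - e^{-t})) (Frullani)."""
    return math.log1p(n * (1.0 - math.exp(-t)))


def distinct_mass(n):
    """sum_k nu_k = int (1 - e^{-nz}) rho(dz) = log(1 + n); equals log(2) at n = 1."""
    return math.log1p(n)
```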
Proof of Proposition 3.2


Proof. The proof relies on the simple fact that, conditioned on the number of points to be sampled, the points of a Poisson process are independent (Kingman, 1992). Thus, $n$ point processes can be sampled from a measure $\Lambda$ by first sampling the number of points in each point process from a Poisson distribution with mean $\Lambda(S)$, and then sampling the points independently from the normalized measure $\Lambda / \Lambda(S)$. Let $\Lambda = \sum_{l=1}^{\infty}\lambda_l\delta_{X_l}$. Let $(X_{l_1}, \dots, X_{l_k})$ be the features discovered by the $n$ Poisson processes. Let the point process $N_i$ consist of $m_{i1}$ occurrences of $X_{l_1}$, $m_{i2}$ occurrences of $X_{l_2}$, ..., and $m_{ik}$ occurrences of $X_{l_k}$. Then, the joint distribution of the $n$ point processes conditioned on $\Lambda$ is given by

$$P(N_1, \dots, N_n \mid \Lambda) = \prod_{i=1}^{n}\left[\frac{\exp(-T)\,T^{\sum_{j=1}^{k} m_{ij}}}{\big(\sum_{j=1}^{k} m_{ij}\big)!}\prod_{j=1}^{k}\Big(\frac{\lambda_{l_j}}{T}\Big)^{m_{ij}}\right],$$

where $T = \Lambda(S) = \sum_{l=1}^{\infty}\lambda_l\delta_{X_l}(S) = \sum_{l=1}^{\infty}\lambda_l$. Readjusting the outermost product in the above equation, we get


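$$P(N_1, \dots, N_n \mid \Lambda) = \frac{\exp(-nT)}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}.$$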
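The two-stage sampling argument can be sketched for a measure $\Lambda$ with finitely many atoms, a hypothetical stand-in since a CRM has infinitely many. Here $P(N_i \mid \Lambda)$ is read as the probability of the ordered sequence of atoms drawn for $N_i$; this is our interpretation of where the $(\sum_j m_{ij})!$ factors come from. The sketch samples the processes exactly as described and checks that the per-process product and the readjusted form give the same log-likelihood. All helper names are illustrative.

```python
import math
import random
from collections import Counter


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_point_processes(weights, n, rng):
    """Draw n independent Poisson processes with mean measure sum_l weights[l] * delta_l.

    Stage 1: the number of points of each process ~ Poisson(T), T = sum(weights).
    Stage 2: each point picks atom l independently with probability weights[l] / T.
    Each process is returned as the ordered list of atom indices it produced.
    """
    total = sum(weights)
    atoms = range(len(weights))
    return [rng.choices(atoms, weights=weights, k=poisson(total, rng)) for _ in range(n)]


def log_lik_per_process(processes, weights):
    """log prod_i [e^{-T} T^{M_i} / M_i! * prod (lambda_l / T)^{m_il}]."""
    T = sum(weights)
    out = 0.0
    for seq in processes:
        M = len(seq)
        out += -T + M * math.log(T) - math.lgamma(M + 1)
        out += sum(math.log(weights[l] / T) for l in seq)
    return out


def log_lik_readjusted(processes, weights):
    """log of e^{-nT} / prod_i M_i! * prod_l lambda_l^{sum_i m_il}."""
    T = sum(weights)
    counts = Counter(l for seq in processes for l in seq)
    return (-len(processes) * T
            - sum(math.lgamma(len(seq) + 1) for seq in processes)
            + sum(c * math.log(weights[l]) for l, c in counts.items()))
```

The $T^{M_i}$ factors cancel against the normalisation $T^{-M_i}$ of the atom probabilities, which is exactly the readjustment performed in the proof.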




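Since we are not interested in the actual points $X_{l_j}$, but only in the number of occurrences of the different points in the point processes, that is, $[m_{ij}]_{(n,k)}$, we can sum over every $k$-tuple of distinct atoms in the random measure $\Lambda$. Hence,

$$P\big([m_{ij}]_{(n,k)} \mid \Lambda\big) = \frac{\exp(-nT)}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!}\sum_{(\lambda_{l_1}, \dots, \lambda_{l_k})}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}, \tag{32}$$

where the sum is over all ordered $k$-tuples of distinct elements of the set $\{\lambda_1, \lambda_2, \dots\}$. Finally, in order to compute the result, we need to take the expectation with respect to the distribution of $\Lambda$. Towards that end, we note that only the weights of $\Lambda$ appear in the above equation. From Section 2.2, we know that the weights of a CRM with Poisson intensity measure $\rho(dz)\mu(dx)$ form a Poisson process $N$ with mean measure $\mu(S)\rho(dz)$. Hence, it is enough to take the expectation with respect to this Poisson process:

$$P\big([m_{ij}]_{(n,k)}\big) = \frac{1}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!}\,\mathbb{E}\Big[\exp(-nT)\sum_{(\lambda_{l_1}, \dots, \lambda_{l_k})}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}\Big] \tag{33}$$

$$= \frac{1}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!}\,\mathbb{E}\Big[e^{-\sum_{\lambda \in N} n\lambda}\sum_{\substack{(\lambda_{l_1}, \dots, \lambda_{l_k}) \in N \\ \text{distinct}}}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}\Big]. \tag{34}$$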


The expectation can further be simplified by applying 
Proposition 2.1 of (James, 2005). 

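Proposition 6.1 (James, 2005). Let $\mathcal{N}$ be the space of all $\sigma$-finite counting measures on $\mathbb{R}_+$, equipped with an appropriate $\sigma$-field. Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ and $g : \mathcal{N} \to \mathbb{R}_+$ be measurable with respect to their $\sigma$-fields. Then, for a Poisson process $N$ with mean measure $\mathbb{E}[N(dx)] = \rho(dx)$,

$$\mathbb{E}\Big[g(N)\,e^{-\sum_{\lambda \in N} f(\lambda)}\Big] = \mathbb{E}\Big[e^{-\sum_{\lambda \in N} f(\lambda)}\Big]\,\mathbb{E}\big[g(\tilde N)\big],$$

where $\tilde N$ is a Poisson process with mean measure $\mathbb{E}[\tilde N(dx)] = e^{-f(x)}\rho(dx)$.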
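Proposition 6.1 can be checked by simulation on a toy case. The sketch assumes a homogeneous Poisson process on $[0, 1]$ with intensity $\rho(dx) = 3\,dx$, $f(x) = 1.5x$ and $g(N)$ = number of points; these choices are ours, not the paper's. The right-hand side is then available in closed form, since $\mathbb{E}[e^{-\sum f}] = \exp(-\int(1 - e^{-f})\rho)$ and the tilted process $\tilde N$ has expected size $\int e^{-f}\rho$.

```python
import math
import random

C_RATE, A_TILT = 3.0, 1.5  # toy choices: intensity C_RATE dx on [0, 1], f(x) = A_TILT * x


def poisson(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) draw."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mc_lhs(reps, rng):
    """Monte Carlo estimate of E[g(N) exp(-sum_{x in N} f(x))] with g(N) = #points."""
    acc = 0.0
    for _ in range(reps):
        points = [rng.random() for _ in range(poisson(C_RATE, rng))]
        acc += len(points) * math.exp(-A_TILT * sum(points))
    return acc / reps


def rhs_closed_form():
    """E[exp(-sum f)] * E[g(N~)], where N~ has mean measure e^{-f(x)} rho(dx)."""
    tilted_mass = C_RATE * (1.0 - math.exp(-A_TILT)) / A_TILT  # int e^{-f} rho
    laplace = math.exp(-(C_RATE - tilted_mass))               # exp(-int (1 - e^{-f}) rho)
    return laplace * tilted_mass
```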


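Applying the above proposition to (32) with $f(\lambda) = n\lambda$, we get

$$P\big([m_{ij}]_{(n,k)}\big) = \frac{\mathbb{E}\big[e^{-\sum_{\lambda \in N} n\lambda}\big]}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!} \times \mathbb{E}\Big[\sum_{\substack{(\lambda_{l_1}, \dots, \lambda_{l_k}) \in \tilde N \\ \text{distinct}}}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}\Big],$$

where $\tilde N$ is a Poisson process with mean measure $\mathbb{E}[\tilde N(dz)] = \theta e^{-nz}\rho(dz)$ and $\theta = \mu(S)$. The first expectation can be evaluated using Campbell's proposition and is given by $\exp\big(-\theta\int_{\mathbb{R}_+}(1 - e^{-nz})\rho(dz)\big)$. In order to evaluate the second expectation, we construct a new point process $\tilde N^*$ on $\mathbb{R}_+^k$ from $\tilde N$ by concatenating every ordered set of $k$ distinct points in $\tilde N$. The expression in the second expectation can then be rewritten as

$$\sum_{(\lambda_{l_1}, \dots, \lambda_{l_k}) \in \tilde N^*}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}.$$

By Campbell's proposition for point processes,

$$\mathbb{E}\Big[\sum_{z \in N} f(z)\Big] = \int f(z)\,\rho(dz),$$

where $\rho(dz) = \mathbb{E}[N(dz)]$. Moreover, since the point process $\tilde N^*$ is obtained by concatenating distinct points in $\tilde N$, $\mathbb{E}[\tilde N^*(dz_1, \dots, dz_k)] = \prod_{j=1}^{k}\mathbb{E}[\tilde N(dz_j)] = \prod_{j=1}^{k}\theta e^{-nz_j}\rho(dz_j)$ whenever the $z_j$'s are distinct. Hence,

$$\mathbb{E}\Big[\sum_{\substack{(\lambda_{l_1}, \dots, \lambda_{l_k}) \in \tilde N \\ \text{distinct}}}\prod_{j=1}^{k}\lambda_{l_j}^{\sum_{i=1}^{n} m_{ij}}\Big] = \prod_{j=1}^{k}\theta\int_{z \in \mathbb{R}_+} e^{-nz} z^{\sum_{i=1}^{n} m_{ij}}\,\rho(dz).$$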
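The decisive step is that summing over ordered $k$-tuples of distinct points turns the expectation into a product of one-dimensional integrals. The Monte Carlo sketch below checks this factorial-moment identity on a toy example of our own choosing: a homogeneous Poisson process on $[0, 1]$ with intensity $\rho(dz) = 3\,dz$ and test function $f(z) = z^2$, for which $\int f\,d\rho = 1$, so the predicted value is $1^k = 1$. The function names are illustrative.

```python
import itertools
import math
import random

C_RATE = 3.0  # toy intensity: rho(dz) = C_RATE dz on [0, 1]


def poisson(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) draw."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def f(z):
    """Test function; int_0^1 f(z) * C_RATE dz = C_RATE / 3 = 1."""
    return z * z


def distinct_tuple_sum(points, k):
    """Sum over ordered k-tuples of distinct points of prod_j f(z_j) (the N* statistic)."""
    return sum(math.prod(f(z) for z in tup) for tup in itertools.permutations(points, k))


def mc_expectation(k, reps, rng):
    """Monte Carlo estimate of E[distinct_tuple_sum(N, k)]."""
    acc = 0.0
    for _ in range(reps):
        points = [rng.random() for _ in range(poisson(C_RATE, rng))]
        acc += distinct_tuple_sum(points, k)
    return acc / reps


def product_of_integrals(k):
    """prod_{j=1}^k int f(z) rho(dz), the factorised value predicted by the proof."""
    return (C_RATE / 3.0) ** k
```

For $k = 1$ this is Campbell's proposition itself; including repeated points (that is, summing over all tuples rather than distinct ones) would add diagonal terms and break the factorisation.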

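Hence, the final expression for the marginal distribution of the set of counts for each latent feature is given by

$$P\big([m_{ij}]_{(n,k)}\big) = \frac{\exp\big(-\theta\int_{\mathbb{R}_+}(1 - e^{-nz})\rho(dz)\big)}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!} \times \prod_{j=1}^{k}\theta\int_{z \in \mathbb{R}_+} e^{-nz} z^{\sum_{i=1}^{n} m_{ij}}\,\rho(dz).$$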


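The above expression can be simplified by letting $\gamma(n) = \theta\int_{\mathbb{R}_+}(1 - e^{-nz})\rho(dz)$. Hence, for $m \ge 1$, $\frac{d^m}{dn^m}\gamma(n) = (-1)^{m-1}\theta\int_{\mathbb{R}_+} z^m e^{-nz}\rho(dz)$. Hence, the above expression can be rewritten as

$$P\big([m_{ij}]_{(n,k)}\big) = \frac{(-1)^{m_{\cdot\cdot} - k}\,e^{-\gamma(n)}}{\prod_{i=1}^{n}\big(\sum_{j=1}^{k} m_{ij}\big)!}\prod_{j=1}^{k}\gamma^{(m_{\cdot j})}(n),$$

where $m_{\cdot j} = \sum_{i=1}^{n} m_{ij}$ and $m_{\cdot\cdot} = \sum_{j=1}^{k} m_{\cdot j}$.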


□ 
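A concrete check of the final formula: under the illustrative assumption $\rho(dz) = z^{-1}e^{-z}\,dz$, we get $\gamma(n) = \theta\ln(1+n)$ and $\theta\int e^{-nz}z^m\rho(dz) = \theta(m-1)!/(1+n)^m$. The sketch sums the formula over every labelled pattern of $M$ points for $n = 1$, labelling features in order of first appearance (our reading of how the ordered $k$-tuples are counted), and compares the total with the negative binomial law of $N(S)$ implied by $\Lambda(S) \sim \text{Gamma}(\theta, 1)$. Helper names are hypothetical.

```python
import math

# Illustrative assumption: rho(dz) = z**-1 * exp(-z) dz (gamma process), so
#   gamma(n) = theta * log(1 + n),
#   theta * int e^{-nz} z^m rho(dz) = theta * (m - 1)! / (1 + n)**m.


def log_marginal(counts, theta):
    """log P([m_ij]) from the final formula of Proposition 3.2 (gamma case).

    counts[i][j] = m_ij for n processes (rows) and k features (columns).
    """
    n, k = len(counts), len(counts[0])
    out = -theta * math.log1p(n)                              # -gamma(n)
    out -= sum(math.lgamma(sum(row) + 1) for row in counts)   # prod_i (sum_j m_ij)!
    for j in range(k):
        m = sum(row[j] for row in counts)                     # m_{.j}
        out += math.log(theta) + math.lgamma(m) - m * math.log1p(n)
    return out


def set_partitions(size):
    """Restricted growth strings: feature labels in order of first appearance."""
    def grow(prefix, blocks):
        if len(prefix) == size:
            yield prefix
            return
        for b in range(blocks + 1):
            yield from grow(prefix + [b], max(blocks, b + 1))
    yield from grow([], 0)


def prob_total_count(M, theta):
    """Sum the Proposition 3.2 probability over all labelled patterns of M points (n = 1)."""
    total = 0.0
    for labels in set_partitions(M):
        k = max(labels) + 1 if labels else 0
        row = [labels.count(j) for j in range(k)]
        total += math.exp(log_marginal([row], theta))
    return total


def neg_binomial_pmf(M, theta):
    """N(S) ~ Poisson(Lambda(S)) with Lambda(S) ~ Gamma(theta, 1), i.e. NegBin(theta, 1/2)."""
    return math.exp(math.lgamma(theta + M) - math.lgamma(theta)
                    - math.lgamma(M + 1) - (theta + M) * math.log(2.0))
```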


Proof of Corollary 3.3 

Proof. From Proposition 3.1, the distinct points in the point processes $N_i$, $1 \le i \le n$, form a Poisson process with mean measure $\mu(dx)\int_{\mathbb{R}_+}(1 - e^{-nz})\rho(dz)$, whose total mass is $\gamma(n)$. Hence, the total number of distinct points $k$ is distributed as $\text{Poisson}(\gamma(n))$. Hence, conditioning equation (5) on $k$, we get the desired result. □
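Proposition 3.1 and Corollary 3.3 together suggest a way to draw $N = N_1 + \dots + N_n$ exactly without instantiating the atoms of $\Lambda$: draw $k \sim \text{Poisson}(\gamma(n))$ distinct atoms, place each according to $\mu/\mu(S)$, and give each a multiplicity drawn from $\nu(k, \cdot)$ normalised over $k \ge 1$. The sketch assumes, for illustration only, $\rho(dz) = z^{-1}e^{-z}\,dz$ and $\mu = \theta\cdot\text{Uniform}[0, 1]$; then $\gamma(n) = \theta\ln(1+n)$ and the multiplicities follow the logarithmic distribution $P(k) = p^k / (k\ln(1+n))$ with $p = n/(n+1)$. Function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) draw."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def log_series(p, rng):
    """Inverse-CDF draw from P(k) = p^k / (k * log(1 / (1 - p))), k = 1, 2, ..."""
    u = rng.random()
    k, pmf = 1, p / -math.log1p(-p)
    cdf = pmf
    while u > cdf and pmf > 0.0:  # pmf > 0 guards against round-off stalls
        k += 1
        pmf *= p * (k - 1) / k
        cdf += pmf
    return k


def sample_aggregate(theta, n, rng):
    """Exact draw of the atoms of N = N_1 + ... + N_n with Lambda marginalised out.

    Assumes rho(dz) = z^-1 e^-z dz and mu = theta * Uniform[0, 1] (illustrative).
    Returns (location, multiplicity) pairs, one per distinct atom.
    """
    p = n / (n + 1.0)
    k = poisson(theta * math.log1p(n), rng)  # distinct atoms ~ Poisson(gamma(n))
    return [(rng.random(), log_series(p, rng)) for _ in range(k)]
```

As a consistency check, the expected total count is $\gamma(n)\cdot n/\ln(1+n) = n\theta = n\,\mathbb{E}[\Lambda(S)]$, which is what sampling $n$ Poisson processes from $\Lambda$ directly would give.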

Proof of Proposition 3.4 

Proof. Let $N = \sum_{i=1}^{n} N_i$. From the arguments of Proposition 3.1, $N$ is a CRM and, hence, can be written as $N(dx) = \int_{\mathbb{R}_+} z\,\tilde N(dz, dx)$ for some Poisson process $\tilde N$. Let $\Pi$ be the random collection of points corresponding to $\tilde N$. Now define a map $f : \mathbb{R}_+ \times S \to S$ as the projection map on $S$, that is, $f(x, y) = y$, and $M = f(\Pi) = \{\{f(\pi) : \pi \in \Pi\}\}$, where the double braces indicate that $M$ is a multiset. The rest of the arguments remain the same as in Proposition 3.1 and Proposition 3.2. □


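Let $m_{i\cdot} = \sum_{j=1}^{r_{i\cdot}} m_{ij}$. Taking the expectation with respect to $\Phi(S)$, we get the marginal distribution of $[m_{ij}]_{1 \le j \le r_{i\cdot}}$, where $r_{i\cdot}$ is also random:

$$P\big([m_{ij}]_{1 \le j \le r_{i\cdot}}\big) = \frac{1}{m_{i\cdot}!}\,\mathbb{E}\Big[\exp\Big(-\Phi(S)\int_{\mathbb{R}_+}(1 - e^{-z})\rho(dz)\Big)\Phi(S)^{r_{i\cdot}}\Big]\prod_{j=1}^{r_{i\cdot}}\int_{\mathbb{R}_+} e^{-z} z^{m_{ij}}\rho(dz). \tag{36}$$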


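It is given that

$$h(u) = \mathbb{E}\big[e^{-u\Phi(S)}\big], \qquad \psi(u) = \int_{\mathbb{R}_+}\big(1 - e^{-uz}\big)\rho(dz).$$

Hence

$$\frac{d^r}{du^r}h(u) = (-1)^r\,\mathbb{E}\big[\Phi(S)^r e^{-u\Phi(S)}\big], \qquad \frac{d^m}{du^m}\psi(u) = (-1)^{m-1}\int_{\mathbb{R}_+} z^m e^{-uz}\rho(dz), \quad m \ge 1.$$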

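Using the above results with $u = \psi(1)$, equation (36) can be rewritten as

$$P\big([m_{ij}]_{1 \le j \le r_{i\cdot}}\big) = \frac{(-1)^{m_{i\cdot}}}{m_{i\cdot}!}\,h^{(r_{i\cdot})}\big(\psi(1)\big)\prod_{j=1}^{r_{i\cdot}}\psi^{(m_{ij})}(1). \tag{37}$$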

□ 
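Equation (37) can be compared against a direct numerical evaluation of (36). The sketch assumes, purely for illustration, $\Phi(S) \sim \text{Gamma}(a, 1)$ with $a = 1.5$, so that $h(u) = (1+u)^{-a}$, and $\rho(dz) = z^{-1}e^{-z}\,dz$, so that $\psi(u) = \ln(1+u)$. The signs of $(-1)^{m_{i\cdot}}$, $h^{(r_{i\cdot})}$ and the $\psi^{(m_{ij})}$ cancel, so the result is positive, as a probability must be. All names are hypothetical.

```python
import math

A_SHAPE = 1.5  # illustrative: Phi(S) ~ Gamma(A_SHAPE, 1), so h(u) = (1 + u) ** -A_SHAPE


def h_deriv(u, r, a=A_SHAPE):
    """r-th derivative of h(u) = (1 + u)^(-a)."""
    return (-1) ** r * math.prod(a + i for i in range(r)) * (1.0 + u) ** (-a - r)


def psi(u):
    """psi(u) = int (1 - e^{-uz}) rho(dz) = log(1 + u) for rho(dz) = z^-1 e^-z dz."""
    return math.log1p(u)


def psi_deriv(u, m):
    """m-th derivative of psi for m >= 1: (-1)^(m-1) (m-1)! / (1 + u)^m."""
    return (-1) ** (m - 1) * math.factorial(m - 1) / (1.0 + u) ** m


def prob_eq37(row):
    """(-1)^{m_i.} h^{(r_i.)}(psi(1)) prod_j psi^{(m_ij)}(1) / m_i.!  for row = [m_i1, ...]."""
    r, m_tot = len(row), sum(row)
    value = (-1) ** m_tot * h_deriv(psi(1.0), r)
    for m in row:
        value *= psi_deriv(1.0, m)
    return value / math.factorial(m_tot)


def midpoint(fn, upper=80.0, step=1e-3):
    """Midpoint-rule approximation of int_0^upper fn(x) dx."""
    return step * sum(fn((i + 0.5) * step) for i in range(int(upper / step)))


def prob_direct(row, a=A_SHAPE):
    """Equation (36) evaluated by quadrature instead of derivatives."""
    r, m_tot, c = len(row), sum(row), psi(1.0)
    outer = midpoint(lambda x: x ** (r + a - 1.0) * math.exp(-x * (1.0 + c)) / math.gamma(a))
    inner = math.prod(midpoint(lambda z, m=m: z ** (m - 1) * math.exp(-2.0 * z)) for m in row)
    return outer * inner / math.factorial(m_tot)
```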


Proof of Lemma 4.1 


Proof. Using Proposition 3.4 to marginalize out the underlying random measure, we get that $[m_{ij}]_{1 \le j \le r_{i\cdot}}$ is distributed as CRM-Poisson$(\Phi(S), \rho, 1)$, that is,

^ exp {-d>{S) 4+ (1 - e-^)p(dz)) 















